Abstract: We design and implement a network-coding-enabled
reliability architecture for next generation wireless networks. Our 
network coding (NC) architecture uses a flexible thread-based 
design, with each encoder-decoder instance applying systematic 
intra-session random linear network coding as a packet erasure 
code at the IP layer, to ensure the fast and reliable transfer of 
information between wireless nodes. 

Using Global Environment for Network Innovations (GENI) 
WiMAX platforms, a series of point-to-point transmission exper- 
iments were conducted to compare the performance of the NC 
architecture to that of the Automatic Repeat reQuest (ARQ)
and Hybrid ARQ (HARQ) mechanisms. At the application layer, 
Iperf and UDP-based File Transfer Protocol (UFTP) are used to 
measure throughput, packet loss and file transfer delay. In our 
selected scenarios, the proposed architecture is able to decrease 
packet loss from around 11-32% to nearly 0%; compared to 
HARQ and joint HARQ/ARQ mechanisms, the NC architecture 
offers up to 5.9 times gain in throughput and 5.5 times reduction 
in end-to-end file transfer delay. Our experiments show that 
network coding as a packet erasure code in the upper layers 
of the protocol stack has the potential to reduce the need for 
joint HARQ/ARQ schemes in the PHY/MAC layers, thus offering 
insights into cross-layer designs of efficient next generation 
wireless networks. 

Index Terms — ARQ, GENI, HARQ, Network Coding, WiMAX 

I. Introduction 

The growing market of mobile devices is placing increasing 
demands on wireless networks. Indeed, at the end of 2009, 
the number of mobile phone subscribers exceeded 4.6 billion 
worldwide [1], and global mobile data traffic has been
predicted to double every year through 2014 [2]. As a result,
a crucial challenge for next generation wireless networks is to
cope with the rapid increase in multimedia traffic with minimal
impact on equipment complexity [2].

Network Coding (NC) was originally proposed to maximize
the capacity of a wired network [3], [4]. It enables nodes to
combine or separate transient bits, packets, or flows through
coding and decoding operations, in addition to storing and
forwarding. In a wireless setting, NC adapts to the dynamics of
the network topology, an essential feature of mobile wireless
networks [5]. Numerous studies have shown that the use of
NC in Wireless Local Area Networks (WLANs) significantly
enhances network throughput, robustness, and security [6], [7].
Notably, 3-4x throughput gains were demonstrated experimentally
in a WiFi context through the use of simple binary
network codes [8]. Random Linear Network Coding (RLNC)
[9], [10], where the NC coefficients are selected randomly
over a given Galois field, has proven particularly effective in
optimizing network resource consumption in WLANs [11], [12].
NC may be applied across the OSI model [13], from the
physical [11] to the network and application layers [14].

Despite the demonstrated effectiveness of NC in WLANs,
NC for Wireless Metropolitan Area Networks (WMANs)
has gained attention only recently, as the telecommunication
industry moves toward next generation wireless networks
such as 4G Worldwide Interoperability for Microwave Access
(WiMAX) [15] and 4G Long Term Evolution (LTE)-Advanced
[16]. 4G requires stationary speeds of 1 Gbps and mobile
speeds of 100 Mbps, while 3G requires stationary speeds of
2 Mbps and mobile speeds of 384 Kbps [17]. That is, 4G
requires speeds 500 and 260 times faster than 3G in the stationary
and mobile cases, respectively. Thus, the need for low-cost
performance-multiplying technologies such as NC is expected
to become significant for WMANs in the near future.

In this work, we design and implement an NC-enabled 
reliability architecture in a WiMAX platform provided by the 
Global Environment for Network Innovations (GENI) project. 
WiMAX offers robust, reliable, and cost-effective delivery of 
broadband services in metropolitan and rural areas fI31 . To 
alleviate the impact of wireless errors on network performance, 
WiMAX adopts two retransmission mechanisms: Automatic 
Repeat reQuest (ARQ) at the upper MAC layer, and Hybrid
ARQ (HARQ) at the lower MAC and PHY layers. In the 
proposed NC architecture, instead of using either or both of 
these retransmission mechanisms, we apply systematic intra- 
session random linear network coding as a packet erasure code 
at the IP layer. In particular, we consider a flexible thread- 
based design, where parallel encoding-decoding instances are 
put in place to ensure reliability is achieved without incurring 
significant delay. 

The choice of the NC implementation layer is crucial. While
NC is applicable across the OSI model, the choice of the
convergence sublayer stands out for a number of reasons.
First, additional performance gains at the physical layer are
onerous, since existing coding schemes have achieved near-optimal
efficiency levels. In contrast, NC may yield important
gains when integrated within the transport and MAC sub-layers,
as demonstrated by extensive recent studies [8], [14], [18]-[20].
In the context of WMANs, transport and MAC functions
are performed at the convergence and MAC sub-layers. Now,
the current context for higher Internet layers (i.e., TCP/IP) is
extremely dynamic. This is essentially due to the sensitivity
of TCP's congestion control to the variety of transmission
environments (e.g., wireless, satellite, optical long-haul, etc.),
leading to the emergence of a number of alternative competing
transport protocols [21], [22] and enhancements [23]. This trend
is compounded by the emergence of IPv6. NC may therefore
benefit from the continuity offered by industrial standards
such as WiMAX and LTE: in the context of WMANs, the
application of NC at the convergence sub-layer would serve
all supported traffic and would be independent from likely
technology and protocol shifts at higher layers. In this work,
we resort to an IP-based implementation since the convergence
sublayer is not accessible in the GENI platform.

A series of point-to-point transmission experiments are 
conducted to compare the performance of our architecture to 
that of the HARQ and ARQ mechanisms. Since the GENI 
WiMAX base stations (BSs) only support chase combining 
(CC) HARQ, only CC-HARQ is considered in our study. 
At the application layer, Iperf and UDP-based File Transfer 
Protocol (UFTP) are used to measure throughput, the percent- 
age of packet loss, and file transfer delay. In our selected 
scenarios, the proposed architecture substantially decreases 
packet loss from around 11-32% to nearly 0%. Compared 
to the HARQ and joint HARQ/ARQ mechanisms, the NC 
architecture offers up to 5.9 times gain in throughput and 
5.5 times reduction in end-to-end file transfer delay. Our 
experimental setups were limited by our ability to access and 
configure the GENI WiMAX platform. Nonetheless, our initial 
assessment of the NC reliability architecture illustrates its 
potential advantages over the HARQ/ARQ scheme, and offers 
exciting opportunities for further investigation. 

One way of interpreting the potential advantages of the
proposed NC architecture over HARQ/ARQ is to view the
latter as an a posteriori repetition code adaptation mechanism,
with rates determined by the number of reactive retransmissions
for each unit of data. Since retransmissions are packet
specific, the rate granularity is low, and the maximum rate
is small. By comparison, NC formulates unique packets into
equivalent degrees of freedom, offering three advantages as
a code adaptation scheme. First, coded packets can be sent a
priori, in expectation of packet losses, thus reducing the effect
of large round trip times in ARQ. Second, each newly received
degree of freedom can make up for any previously lost packet,
thus leading to rate adaptation in steps of 1/block-size, where
a block is the group of data packets coded together. Third,
HARQ/ARQ relies heavily on the acknowledgment process,
and is thus prone to ACK/NACK errors, delays, and losses, which
in turn can result in inefficient retransmission of correctly
received packets. NC is less sensitive, since each transmitted
coded packet is a new degree of freedom that can be useful
in decoding. The combination of proactive transmissions, rate
adaptation with a finer granularity, and robustness to ACK
losses makes NC an efficient alternative reliability mechanism.
It is also more in line with the ever increasing speed and
performance of a priori adaptive modulation and coding at
the PHY layer.

The remainder of this article is organized as follows. Section
II is an overview of NC-based HARQ/ARQ alternatives and
enhancements. Section III describes the NC-based reliability
architecture. Section IV considers the experimental setup and
introduces the performance metrics. Section V illustrates and
discusses the main results. Finally, Section VI concludes the
paper.

II. Related Work 

WiMAX is one of the two major WMAN standards [24],
along with LTE. It adopts two retransmission mechanisms
for reliability: ARQ at the upper MAC layer and HARQ
at the lower MAC and PHY layers. In ARQ, block retransmissions
are processed independently; in HARQ, Forward
Error Correction (FEC) and ARQ are combined, and
subsequent retransmissions of a given information block are
jointly processed with the original block. The two extensively
investigated implementations of HARQ are Chase Combining
(CC) and Incremental Redundancy (IR) [25]. In WiMAX, both
the HARQ and ARQ features can be turned on, leading to a
joint HARQ/ARQ setup. Observe that retransmissions under
HARQ and/or ARQ require the reception of positive (ACK) or
negative (NACK) acknowledgment messages for each block,
hence compounding overhead.

Network Coding (NC) was initially shown to be a capacity-achieving
coding scheme for multicast in wired networks
[3], [4]. A number of analytical studies then proposed NC
as a stand-alone information-theoretic technique to improve
throughput and retransmission efficiency in wireless networks.
Lun et al. [26], [27] show that RLNC is capacity-achieving for
multicast connections in packet erasure networks. Dana et al.
[28] derive the capacity for a class of wireless erasure networks
with no interference at reception and show that linear coding
suffices to achieve the capacity region. Ghaderi et al. [29]
analytically quantify the reliability gain of NC for wireless
multicast and show that NC asymptotically achieves performance
similar to that of rateless erasure coding. Sundararajan
et al. [30] theoretically extend ARQ with RLNC [9], [10], while
Nguyen et al. [31] provide results comparing the bandwidth
efficiency of RLNC to that of ARQ. In addition, Pu et al.
[32] develop an information-theoretic performance bound to
predict the coding gains of CC-HARQ in broadcast settings.

Algorithmically, the combination of packet retransmissions
in single-hop multiple unicast and multicast settings is among
the earliest and most widely studied schemes using NC as a
wireless reliability mechanism. We call such a scheme retransmission
coding. Jolfaei et al. [33] apply XOR retransmission
coding to ARQ in a point-to-multipoint connection over broadcast
links, while Yong et al. [34] consider the multicast setting.
Larsson et al. [35], [36] study XOR retransmission coding for
multi-user ARQ in multiple unicast settings and suggest that
linear coding in larger fields may be used. Larsson et al. [37]
also consider adaptive linear NC for ARQ in multicast settings,
where coding coefficients are adaptively selected from a sufficiently
large finite field. Note that earlier schemes closely
related to NC exist. Metzner [38], for instance, presented a
packet coding retransmission scheme for single-hop broadcast
whereby the retransmitted frame is built through XORing the
NACKed frames of various receivers.

Recently, practical NC-based retransmission algorithms
have also been proposed, often using simulations to measure
performance. MAC RLNC (MRNC) [39], [40] uses RLNC at
the MAC layer, where data blocks are segmented and coded
together. N-in-1 NC [41] extends MRNC by coding over more
than one block for retransmissions. The authors in [41] report a
throughput gain of up to 106% over conventional CC-HARQ.
In addition, Manssour et al. [42] propose a retransmission
scheme for wireless unicast using a combination of channel
coding and NC, showing 68.75% throughput gains compared
to CC-HARQ. Qureshi et al. [43] present BENEFIT, an
efficient retransmission algorithm for single-hop wireless multicast
networks based on NC with conditional retransmission
and reduced decoding times. While all previous contributions
apply network coding digitally, SYNC [44] considers symbol-level
network coding at the physical layer, thus making use of
corrupted packets.

Some studies also combine retransmission coding with
HARQ (NC-HARQ) in single-hop wireless networks [45].
NC-HARQ uses XOR retransmission coding in conjunction
with FEC, thus, in effect, combining network and channel
coding. Thobaben et al. [46] and Larsson et al. [47] consider
NC-HARQ for multi-user HARQ in multiple unicast settings.
Peng et al. [48] consider NC-HARQ in both broadcast and unicast
scenarios. Tran et al. [49] extend NC-HARQ by adapting
the amount of FEC in real time to channel conditions. This
technique increases the throughput efficiency up to 3.5 times
over ARQ and 1.5 times over HARQ. Zhang et al. [50] extend
NC-HARQ by supplementing XOR retransmission coding
with additional XOR operations, combining dynamically lost
packets from the same receiver. Lu et al. [51] study NC-HARQ
in the context of wireless video broadcast. Abuzeid et al. [52]
compare NC-HARQ and IR-HARQ in cooperative wireless
communication systems.

NC is further proposed in multi-hop and cooperative contexts.
Fan et al. [53], Sun et al. [54] and Vien et al. [55]
consider a scenario where two nodes communicate with the
BS with the assistance of a relay. Fan et al. [53] introduce an
NC-based cooperative multicast scheme while Sun et al. [54]
discuss cooperative HARQ based on NC (C-HARQ-NC). Vien
et al. [55] investigate ARQ based on NC for two-way wireless
relay networks. Recently, Vien et al. also discuss NC-based
block ARQ (BACK) for wireless relay networks [56]. Hong
et al. [57] propose NC-HARQ for mobile relay systems.



In this paper, we are interested in the use of NC in a
WiMAX setting. Past work in this area includes contributions
by Jin et al. [39], [40] and Yazdi et al. [58], where NC
is considered in conjunction with ARQ and/or HARQ in
WiMAX. Jin et al. [39] introduce MRNC and report a 10%
gain in throughput over HARQ in single-hop transmissions.
The adaptive extension of MRNC [40] outperforms regular
MRNC by 28.4% and HARQ by 57.7% in terms of throughput.
Adaptive MRNC uses the channel state information feedback
to dynamically adjust packet size according to current channel
conditions. Yazdi et al. [58] extend MRNC to restrict the number
of retransmissions to an upper bound, which is important
for delay sensitive applications.

Despite the large number of studies covering NC as an 
alternative or enhancement to HARQ and ARQ mechanisms, 
they are limited to analysis and simulation, and none are 
supported by experimentation. To our knowledge, our work 
provides a first experimental implementation of NC as a 
throughput efficiency and reliability mechanism in a single- 
hop wireless link. After describing our NC-based design and 
implementation in the next sections, we compare its perfor- 
mance to that of ARQ and CC-HARQ in WiMAX. 

III. NC-Enhanced Architecture 

Our proposed NC-enabled reliability architecture is implemented
in the form of an NC module at the IP layer of
the network protocol stack, as shown in Fig. 1. A Linux
packet filtering framework (netfilter) [59] intercepts, copies
and forwards IP packets to the NC module, which in turn
injects processed packets back into the IP layer.

The NC module is implemented in user-space. It acts as 
an encoder at the base station (BS), and as a decoder at 
the subscriber station (SS). At the BS, the source applica- 
tion, located in user-space, sends outgoing IP packets to the 
Operating System (OS) where the transport and IP layers 
are run. Netfilter intercepts those packets and sends them to 
the encoder NC module in user-space. The encoder returns 
coded IP packets to the OS. Coded IP packets then traverse 
the WiMAX stack, passing through the Convergence Sublayer 
(CS), the upper and lower MAC sublayers and the PHY layer. 
At the SS, netfilter intercepts the incoming coded IP packets 
handed from WiMAX to the OS and delivers them to the 
decoder NC module in user-space. The decoder sends decoded 
packets back to the OS, where they are forwarded to the 
destination application. In the NC-enhanced architecture, ARQ 



TABLE I: Design parameters for the NC Module.

Parameter   Description
N_p         number of concurrent encoder-decoder thread pairs
L_t         processing length threshold of the buffer list
T_i         processing time interval of the buffer list
L_m         maximum length of segments
N_r         preferred number of segments
N_k         number of rounds of redundancy transmission
N_m         number of redundancy packets per round
T_r         time interval between each round



[Fig. 1 diagram. Panel (a), IP-based NC architecture: protocol stack
(application layer, transport layer, IP layer, convergence sublayer,
upper MAC sublayer (ARQ), lower MAC sublayer (H-ARQ), PHY layer) with
netfilter and the network coding module attached at the IP layer.
Panel (b), NC module encoder: encoder master thread and encoder worker
threads 1 to N_p. Panel (c), NC module decoder: decoder master thread
and decoder worker threads 1 to N_p.]

Fig. 1: (a) IP-based NC architecture, at the BS for encoding
and at the SS for decoding: 1) IP packets are intercepted. 2)
Netfilter copies and forwards IP packets to the NC module. 3)
Processed packets are injected back. (b) NC module: encoder;
an encoder master thread load-balances N_p encoder worker
threads in a round-robin fashion. (c) NC module: decoder; a
decoder master thread and N_p decoder worker threads.

TABLE II: Derived variables for the NC Module.

Variable    Description
L_s         calculated segment length
N_s         calculated number of segments
L           total length of an outgoing IP packet
L_b         coding block length
L_p         length of the current IP packet (temporary)


A. Thread-Based Encoder and Decoder Design 

The NC module uses a flexible thread-based design, where
parallel encoding-decoding instances are generated to process
packets concurrently. Systematic intra-session RLNC is applied.
The encoder and decoder processes each have a master
thread and N_p worker threads, as shown in Fig. 1(b) and (c).
Each encoder-decoder thread pair operates independently from
other pairs and is identified by a unique Thread ID (TID). The
encoder master thread load-balances encoder worker threads
by distributing incoming packets in a round-robin fashion. The
decoder master thread dispatches incoming coded IP packets
from encoder worker threads to the corresponding decoder
worker threads according to their TID. Each worker thread
in the encoder process matches a unique worker thread in the
decoder process, as shown in Fig. 2. Next, we explain in more
detail the encoding, decoding, and feedback mechanisms in
an encoder-decoder thread pair.



[Fig. 2 diagram: encoder thread x sends coded IP packets to decoder
thread x, which returns an ACK IP packet.]

Fig. 2: Pair of encoder-decoder worker threads, exchanging
coded IP packets and an ACK packet.
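
The master/worker dispatch described above can be illustrated with a small Python sketch. It is not the authors' implementation: the class names EncoderMaster and DecoderMaster are hypothetical, plain lists stand in for the thread-safe queues that a real multi-threaded module would need, and the list index is assumed to play the role of the TID.

```python
from itertools import cycle


class EncoderMaster:
    """Hands each buffer list to the next encoder worker in round-robin order."""

    def __init__(self, n_p):
        # One queue per worker thread; the index is used as the Thread ID (TID).
        self.queues = [[] for _ in range(n_p)]
        self._next_tid = cycle(range(n_p))

    def dispatch(self, buffer_list):
        tid = next(self._next_tid)
        self.queues[tid].append(buffer_list)
        return tid


class DecoderMaster:
    """Routes each coded packet to the decoder worker paired with its TID."""

    def __init__(self, n_p):
        self.queues = [[] for _ in range(n_p)]

    def dispatch(self, tid, coded_packet):
        self.queues[tid].append(coded_packet)


encoder = EncoderMaster(n_p=3)
tids = [encoder.dispatch(f"buffer-{i}") for i in range(5)]
# With three workers, the TIDs wrap around after the third buffer list.
```

Because the decoder routes purely on the TID carried in each packet, the two masters never need to agree on anything beyond the number of worker pairs.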

1) Encoding Mechanism: Fig. 3 illustrates the encoding
mechanism. Incoming IP packets are first buffered at the
master thread, and stored successively as a buffer list. The
master thread uses Alg. 1 to determine when the buffer list
is handed to the next worker thread. At a worker thread, the
buffer list is concatenated into a coding block. Next, the block
is to be divided into segments, the basic unit of operation for
the NC module. The number of segments (N_s) and segment
length (L_s) are calculated according to Alg. 2. Byte padding



and HARQ, which run at the upper and lower MAC sublayers
respectively, are switched off.

Table I lists the key design parameters for the proposed
NC module, while Table II lists variables derived from these
module parameters. Exact definitions of these parameters and
variables will be provided in subsequent subsections, where
their importance will also be discussed. In the rest of this
Section, we describe details of the NC module implementation,
starting by defining the encoder and decoder processes given
in Figs. 1(b) and (c).



Algorithm 1 Determine when the master thread sends the
buffer list to the next worker thread. L_b is the length of the
buffer list. T_i is the time threshold to concatenate the buffer
list. L_t is the maximum length of the buffer list. L_p is the
length of the current incoming IP packet.

  Initialize timer T
  Initialize length L_b of buffer list
  while T < T_i and L_b < L_t do
    Receive new packet with length L_p
    L_b ← L_b + L_p
  end while
  Transfer buffer list to next worker thread
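
As a rough illustration of Alg. 1, the following Python sketch collects packets until either the timer or the length threshold is reached. The function name collect_buffer_list, the recv_packet callback (assumed to return bytes, or None when nothing is waiting) and the injectable clock are hypothetical names introduced for this example; they are not part of the described module.

```python
import time


def collect_buffer_list(recv_packet, t_i, l_t, clock=time.monotonic):
    """Gather packets until t_i seconds elapse or l_t bytes are buffered."""
    start = clock()          # initialize timer T
    buffer_list = []
    l_b = 0                  # L_b: total length of the buffer list
    while clock() - start < t_i and l_b < l_t:
        packet = recv_packet()   # hypothetical receive; None if no packet yet
        if packet is None:
            continue
        buffer_list.append(packet)
        l_b += len(packet)       # L_b <- L_b + L_p
    return buffer_list           # handed to the next worker thread


packets = iter([b"a" * 100] * 10)
block = collect_buffer_list(lambda: next(packets), t_i=1.0, l_t=250)
# The loop stops once the buffered length reaches the 250-byte threshold.
```

Note that the length check happens before each receive, so the last packet accepted may push the buffered length past L_t.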



[Fig. 3 diagram: 1. Buffering (incoming IP packets). 2. Concatenation
(coding buffer list to coding block). 3. Padding (padded coding block).
4. Segmentation. 5. Coding (each coded segment is a linear combination
of segments i = 1, ..., N_s). 6. Encapsulation (an NC header is
prepended to each coded segment, giving coded IP packets).]

Fig. 3: Successive steps of the encoder: 1) Incoming IP packets
are buffered at the master thread, forming a coding buffer list.
Alg. 1 determines when the buffer list is handed to a worker
thread. 2) At each worker thread, the list is concatenated into
a coding block. 3) Alg. 2 determines the number of segments
(N_s) and segment length (L_s), and byte padding is added. 4)
The block is segmented. 5) Alg. 3 encodes the segments.
6) Coded segments are encapsulated into coded IP packets.



is applied so that the padded block length is a multiple of N_s. The
block is then segmented and the resulting segments are coded
according to Alg. 3. Finally, the encapsulation step adds a
coding header, thus producing coded IP packets from the
generated segments. Therefore, in our architecture, although
segment size is constant for each block, it varies among blocks,
depending on the traffic intensity.



Algorithm 2 Determine the segment length L_s and the number
N_s of segments, given the length L_b of the coding block, the
maximum length L_m of segments, and the preferred number
N_r of segments.

  L_b ← L_b + 1            ▷ 1 byte for the padding boundary
  L_s ← ⌈L_b / N_r⌉
  N_s ← N_r
  while L_s > L_m do
    N_s ← N_s + 1
    L_s ← ⌈L_b / N_s⌉
  end while



Algorithm 3 Encode. N_s, N_k, N_m and T_r are defined in
Tables I and II. Terminate immediately if an ACK for the
same coding block is received.

  for x = 1 to N_s do          ▷ generate systematic code first
    generate an uncoded segment
  end for
  while ACK has not yet been received do
    for y = 1 to N_k do
      for z = 1 to N_m do
        generate a coded segment
      end for
      wait for duration T_r
      ▷ terminate if an ACK is received
    end for
  end while



More specifically, at the master thread, Alg. 1 determines
when the buffer list is handed to the next worker thread by
combining a timeout mechanism with a maximum size trigger
for buffer list concatenation. Concatenation is triggered as soon as
either the time interval T_i or the buffer length threshold L_t is reached.

At each worker thread, after concatenating a new buffer list,
Alg. 2 determines N_s and L_s for the new coding block. Byte
padding is added so that the padded block length is a multiple of N_s.
We use the well-known ANSI X.923 byte padding algorithm
[60]. In ANSI X.923, bytes filled with zeros are appended to
the data and the last byte stores the number of padded bytes.
Alg. 2 first adds 1 byte to the coding block length for the
padding boundary; it then initializes the segment length L_s to
⌈L_b / N_r⌉, and the number of segments N_s to the preferred number
of segments N_r. Next, L_s and N_s are adjusted to make L_s
less than or equal to the maximum length of segments L_m.
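
The segment sizing and padding steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the division is assumed to round up so that N_s segments of length L_s always cover the block, and the pad count is assumed to fit in the single trailing byte that ANSI X.923 provides.

```python
def segment_params(l_b, l_m, n_r):
    """Return (N_s, L_s) for a coding block of l_b bytes (sketch of Alg. 2)."""
    l_b += 1                              # 1 byte for the padding boundary
    n_s = n_r
    l_s = (l_b + n_s - 1) // n_s          # assumed ceiling division
    while l_s > l_m:
        n_s += 1
        l_s = (l_b + n_s - 1) // n_s
    return n_s, l_s


def pad_x923(block, n_s, l_s):
    """Zero-fill the block to n_s * l_s bytes; the last byte is the pad count."""
    pad_len = n_s * l_s - len(block)
    if not 1 <= pad_len <= 255:
        raise ValueError("pad length must fit in the final byte")
    return block + bytes(pad_len - 1) + bytes([pad_len])


def unpad_x923(padded):
    """Strip the padding using the count stored in the last byte."""
    return padded[: len(padded) - padded[-1]]


def split_segments(padded, n_s, l_s):
    return [padded[i * l_s:(i + 1) * l_s] for i in range(n_s)]


block = b"x" * 3000
n_s, l_s = segment_params(len(block), l_m=1400, n_r=2)
segments = split_segments(pad_x923(block, n_s, l_s), n_s, l_s)
```

With a 3000-byte block, a preferred count of 2 would need 1501-byte segments, which exceeds the 1400-byte maximum, so the sketch settles on 3 segments of 1001 bytes with 3 bytes of padding.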

After padding, the coding block is segmented and coded
according to Alg. 3. We use systematic RLNC, which has
been shown to offer advantages in terms of decoding delay
[61] when compared to non-systematic network codes. With
systematic coding, N_s uncoded segments are first generated
and sent, followed by coded segments generated with random
coefficients. We refer to the uncoded segments as 'systematic,'
and the coded segments as 'nonsystematic.' Up to N_k rounds
of N_m nonsystematic segments are transmitted. An inter-round
pause of duration T_r is implemented to allow other threads
to process their blocks. Although a received ACK terminates
the encoding process, it is important to note that the encoder
does not require ACK packets to operate. Encoding may be
terminated when the maximum number N_k of retransmission
rounds is reached, thus protecting against inefficiencies due to
ACK errors or losses. We implement RLNC in a Galois Field
of size 2^8, which is sufficient for practical applications [9], [10].
Each coefficient is hence expressed in a single byte.
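
To make the systematic RLNC step concrete, here is a minimal Python sketch of encoding over GF(2^8). Several details are assumptions made for illustration: the text does not name the field's reduction polynomial, so the common choice 0x11B is used; the function names are hypothetical; and the ACK-based early termination and the T_r pauses of Alg. 3 are omitted, so all N_k x N_m coded segments are produced at once.

```python
import random


def gf_mul(a, b, poly=0x11B):
    """Multiply two GF(2^8) elements (assumed reduction polynomial 0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def encode_block(segments, n_k, n_m, rng=None):
    """Return (coefficients, payload) pairs: systematic first, then coded."""
    rng = rng or random.Random()
    n_s, l_s = len(segments), len(segments[0])
    packets = []
    # Systematic part: segment i is sent uncoded, i.e. with a unit vector.
    for i, seg in enumerate(segments):
        unit = bytes(1 if j == i else 0 for j in range(n_s))
        packets.append((unit, bytes(seg)))
    # Nonsystematic part: random linear combinations of all segments.
    for _ in range(n_k * n_m):
        coeffs = bytes(rng.randrange(256) for _ in range(n_s))
        payload = bytearray(l_s)
        for c, seg in zip(coeffs, segments):
            for k in range(l_s):
                payload[k] ^= gf_mul(c, seg[k])
        packets.append((coeffs, bytes(payload)))
    return packets


coded = encode_block([b"abcd", b"efgh", b"ijkl"], n_k=2, n_m=2)
```

Addition in GF(2^8) is a bytewise XOR, which is why each coded payload is accumulated with `^=`.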

Coded segments are encapsulated into coded IP packets.
The structure of the NC header used during encapsulation is
shown in Fig. 4. The NC header contains the IP header, Thread
ID (TID), Block ID (BID), Segment ID (SID), number N_s of
segments, and coding coefficients. Segment length L_s is not
included because it can be derived from the packet length
field in the IP header. TID identifies which thread the packet
belongs to; BID identifies which block the packet belongs to
within a given thread. For each thread, BID is incremented
for every new coded block. SID keeps track of the number of
segments generated in a particular block; it is incremented for
every generated packet. N_s and coding coefficients are necessary for
decoding, which we describe in the next section.

Fig. 4: Structure of the NC header. Possible modifications to
this header are replacing the coefficients with a random seed
and accounting for IP packet fragmentation (Section III-B4).

2) Decoder Mechanism: At a decoder worker thread, operations
described in Fig. 3 are reversed in order to recover
the original IP packets. First, decapsulation strips off the NC
header. For each reassembled coded block, received coded
IP packets are decoded progressively using Alg. 4, which is
based on Gauss-Jordan elimination [62]. Once the block is decoded, it is
unpadded and the original uncoded IP packets are separated.
Note that if a packet with a different BID from the current
block arrives at a decoding worker thread before the current
block is decoded, the decoder will drop the current block and
start decoding the new block.
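
Progressive decoding can be sketched as follows: each arriving packet is reduced against the rows already held, and the block is decoded once the rank reaches N_s. This Python example is only a sketch under assumptions. The class name BlockDecoder is hypothetical, the GF(2^8) reduction polynomial 0x11B is assumed because the text does not specify one, and BID handling (dropping an unfinished block when a new BID arrives) is left out.

```python
def gf_mul(a, b, poly=0x11B):
    """Multiply two GF(2^8) elements (assumed reduction polynomial 0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def gf_inv(a):
    """Inverse of a nonzero element, computed as a^254."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


class BlockDecoder:
    def __init__(self, n_s):
        self.n_s = n_s
        self.rows = {}  # pivot column -> reduced row [coefficients | payload]

    @staticmethod
    def _add_scaled(target, factor, source):
        """target <- target + factor * source over GF(2^8)."""
        if factor:
            for k in range(len(target)):
                target[k] ^= gf_mul(factor, source[k])

    def add_packet(self, coeffs, payload):
        """Insert one packet; return True once the rank equals N_s."""
        row = bytearray(coeffs) + bytearray(payload)
        for col, prow in self.rows.items():      # clear known pivot columns
            self._add_scaled(row, row[col], prow)
        pivot = next((c for c in range(self.n_s) if row[c]), None)
        if pivot is not None:                    # packet is innovative
            inv = gf_inv(row[pivot])
            for k in range(len(row)):
                row[k] = gf_mul(inv, row[k])
            for prow in self.rows.values():      # Jordan step on older rows
                self._add_scaled(prow, prow[pivot], row)
            self.rows[pivot] = row
        return len(self.rows) == self.n_s

    def segments(self):
        return [bytes(self.rows[c][self.n_s:]) for c in range(self.n_s)]
```

A packet whose coefficient vector is a linear combination of earlier ones reduces to an all-zero row and is simply discarded, which matches the rank test in Alg. 4.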

3) Feedback Mechanism: Once a block is decoded, the decoder
worker thread sends an ACK packet to the corresponding
encoder worker identified by its TID. Fig. 5 shows the structure
of an ACK packet. If the encoder worker thread is still running
Alg. 3 on the block with the same BID as that in the ACK
packet, the worker will terminate the algorithm.



Algorithm 4 Block decoding algorithm. M is the current
coefficient matrix of incoming coded packets. M[r + 1] refers
to row r + 1 of M. rank(M) is the rank of M.

  r ← 0
  for each incoming coded IP packet N_p do
    M[r + 1] ← coefficients and segment of N_p
    Gauss-Jordan elimination on (r + 1) × (N_s + L_s) of M
    if rank(M) = r + 1 then
      r ← r + 1
      if r = N_s then
        done decoding
      end if
    end if
  end for



[Fig. 5 diagram: ACK packet fields, in order: IP header, TID, BID.]

Fig. 5: Structure of an ACK packet



B. Other Design Considerations 

1) Code Rate: The Code Rate (CR) of the presented design
is defined as the ratio of the number N_s of segments to the sum
of N_s and the number of redundancy segments:

$$\mathrm{CR} = \frac{N_s}{N_s + N_k \times N_m}, \qquad (1)$$

where N_k is the number of redundancy rounds, and N_m is the
number of redundancy segments transmitted per round. Note
that this is a lower bound on the effective code rate, as an
ACK may interrupt the encoder before the transmission of N_k
rounds of N_m redundancy segments is completed, in which case
fewer redundancy segments are sent.
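
As a quick numerical illustration of Eq. (1), the short Python sketch below computes the configured code rate and the rate realized when an ACK stops transmission early. The function names and the parameter values (N_s = 120, N_k = 2, N_m = 15) are illustrative assumptions, not settings reported in this paper.

```python
def code_rate(n_s, n_k, n_m):
    """Code rate of Eq. (1): all N_k rounds of N_m redundancy segments sent."""
    return n_s / (n_s + n_k * n_m)


def effective_code_rate(n_s, redundancy_sent):
    """Rate realized when only redundancy_sent coded segments went out."""
    return n_s / (n_s + redundancy_sent)


configured = code_rate(n_s=120, n_k=2, n_m=15)               # 120 / 150
after_one_round = effective_code_rate(120, redundancy_sent=15)  # 120 / 135
```

In this example an ACK received after the first redundancy round leaves the realized rate at 120/135, above the configured 120/150.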

2) IP Packet Fragmentation: Since systematic RLNC is 
used, blocks that cannot be decoded can still contain useful 
information, as some uncoded packets may be extracted. 
To determine where an IP packet starts in a segment, we 
implement an additional two-byte field in the NC header called 
start. The start field allows IP packet defragmentation at the 
decoder in the event of unsuccessful block decoding. 

3) Overhead Reduction through Random Seeds: Assuming
one byte per coefficient, the total NC header length is L_h +
N_s, where L_h is the length of the NC header without coding
coefficients. The NC header overhead ratio is therefore
$\frac{L_h + N_s}{L_s}$, where L_s is the segment length. If N_s is 120,
L_h is 24, and L_s is 1400, the overhead is 10.29%. This overhead can be
reduced in three ways: 1) by increasing L_m, the maximum
length of segments, thus increasing L_s, 2) by reducing N_s, and
3) by sending a seed of a pseudo-random number generator
instead of a coefficient vector [39], [40], [63]. In this work, we
implement the third option. Using random seeds, the overhead
becomes $\frac{L_h + q}{L_s}$, where q is the size of the seed value, typically
4 bytes. Using the previously assumed values of L_h and L_s,
the overhead is reduced to 2%.
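
The two overhead figures quoted above can be reproduced with a few lines of Python. The function names are hypothetical; the ratios are taken relative to the segment length L_s, which is the convention that matches the 10.29% and 2% values in the text.

```python
def overhead_with_coefficients(l_h, n_s, l_s):
    """Header overhead when N_s one-byte coefficients are carried per packet."""
    return (l_h + n_s) / l_s


def overhead_with_seed(l_h, q, l_s):
    """Header overhead when a q-byte random seed replaces the coefficients."""
    return (l_h + q) / l_s


full = overhead_with_coefficients(l_h=24, n_s=120, l_s=1400)   # 144 / 1400
seeded = overhead_with_seed(l_h=24, q=4, l_s=1400)             # 28 / 1400
print(f"{full:.2%} with coefficients, {seeded:.2%} with a seed")
```

The saving grows with N_s, since the seed length q stays fixed while the coefficient vector grows by one byte per segment.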

In order to support random seeds, we add new fields to
the NC header: type and either segment number (segn) or
seed. The type field is used to distinguish whether a packet is
coded or uncoded; segn is used in a systematic packet
to specify the segment number; seed is used in a coded packet
as a random seed. We use a simple pseudo-random number
generator described in Alg. 5.

4) Implemented NC Header: In our implementation, the 20-byte
IPv4 header is augmented as follows. For all generated
packets, one-byte fields representing TID, BID, SID, N_s,
start and type are added. For systematic and coded packets,
a one-byte segn segment number or a two-byte seed field
is added, depending on whether the packet is systematic or
coded, respectively. The IP checksum is recalculated for each
generated packet according to RFC 1071 [64].
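
For reference, the RFC 1071 checksum mentioned above is a 16-bit one's-complement sum over the header. The Python sketch below shows the computation on a sample 20-byte IPv4 header whose checksum field has been zeroed; the function name internet_checksum and the sample header bytes are illustrative and are not taken from the described implementation.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                       # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


# Sample IPv4 header with the checksum field (bytes 10-11) set to zero.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = internet_checksum(header)
patched = header[:10] + checksum.to_bytes(2, "big") + header[12:]
```

A receiver can validate a header by running the same function over it with the checksum in place; a correct header yields zero.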



Algorithm 5 Gerhard's Generator: a is initialized to 1. Given
the seed a, the function generates a pseudo-random number
from 1 to lim.

  a ← 1
  function RAND(lim)
    a ← (a × 32719 + 3) mod 32749
    return (a mod lim) + 1
  end function
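
A direct Python transcription of Alg. 5 is shown below, together with one possible way to expand a seed into a coefficient vector. The class name GerhardGenerator and the helper coefficients_from_seed are hypothetical, and calling rand(255) to obtain nonzero one-byte coefficients is an assumption; the text does not state how generator outputs are mapped to GF(2^8) coefficients.

```python
class GerhardGenerator:
    """Linear congruential generator following Alg. 5."""

    def __init__(self, seed=1):
        self.a = seed

    def rand(self, lim):
        """Return a pseudo-random integer between 1 and lim inclusive."""
        self.a = (self.a * 32719 + 3) % 32749
        return (self.a % lim) + 1


def coefficients_from_seed(seed, n_s):
    """Expand a seed into n_s coefficients in 1..255 (assumed mapping)."""
    gen = GerhardGenerator(seed)
    return bytes(gen.rand(255) for _ in range(n_s))


# Encoder and decoder derive the same vector from the same seed,
# so only the seed needs to travel in the NC header.
sender_side = coefficients_from_seed(seed=7, n_s=5)
receiver_side = coefficients_from_seed(seed=7, n_s=5)
```

Because the generator is deterministic, the decoder can rebuild each coefficient vector locally as long as it knows the seed and N_s.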



TABLE IV: Available PHY transmission settings with resulting
Carrier to Interference plus Noise Ratio (CINR), Received
Signal Strength Indication (RSSI) and Average Tx Power,
measured at the SS.

                  BS             SS
MCS               Tx. Power      CINR     RSSI       Tx. Power
64 QAM CTC 1/2    13 dBm         13 dB    -76 dBm    -63 dBm
64 QAM CTC 2/3    17 dBm         17 dB    -76 dBm    -63 dBm
64 QAM CTC 3/4    18 dBm         18 dB    -75 dBm    -63 dBm
64 QAM CTC 5/6    20 dBm         18 dB    -73 dBm    -63 dBm



IV. Experimental Setup and Performance Metrics 



The proposed architecture is implemented over a WiMAX
IEEE 802.16 [65] downlink available through the Global
Environment for Network Innovations (GENI) collaborative
research framework [66]. Four fixed downlink modulation and
coding schemes (MCSs) and transmission power levels are
available at the BS. For each of those PHY layer settings,
11 reliability configurations are run, including raw unreliable
transmission (termed raw) and various ARQ, HARQ and NC
arrangements. For each of the reliability configurations, two
transmission trials are conducted through Iperf and UFTP,
respectively. The default HARQ and ARQ configurations of the
GENI WiMAX stations are used, partly because such default
settings establish a clear baseline for comparison, and partly
because we had very limited ability to adjust the PHY and MAC
layer parameters. Below we provide implementation details on
the PHY, MAC, and reliability configurations.

A. PHY- and MAC-Layer Settings

At the physical layer, four fixed downlink MCSs and BS
transmission power levels are available, with increasing PHY
code rates and power levels. These are listed in Table IV
along with the Carrier to Interference plus Noise Ratio (CINR),
Received Signal Strength Indication (RSSI) and Average Tx
Power, measured at the SS.

TABLE III: PHY- and MAC-Layer BS Configuration.

Parameters                                Value
PHY                                       OFDMA
Frequency                                 2.59 MHz
Bandwidth                                 10 MHz
Duplexing mode                            TDD
Frames per second                         200 (5 ms per frame)
Power level                               Fixed (per configuration)
Downlink Modulation Coding Scheme (MCS)   Fixed (per configuration)
Uplink Modulation Coding Scheme (MCS)     Fixed at QPSK CTC 1/2
HARQ_TYPE                                 CC
HARQ_MAX_UL_BURST                         1
HARQ_MAX_DL_BURST                         1
HARQ_UL_ACK_DELAY                         3 frames
HARQ_DL_ACK_DELAY                         1 frame
HARQ_PDU_SN                               ON
HARQ_MAX_RETRANSMISSION                   4
ARQ_RETRY_TIMEOUT                         100 ms
ARQ_BLOCK_SIZE                            256 Bytes
ARQ_WINDOW_SIZE                           1024
ARQ_TX_ACK_DELAY                          ms
ARQ_ACK_PROC_TIME                         0 ms
ARQ_BLOCK_LIFETIME                        500 ms
ARQ_DLV_ORDER                             ON
ARQ_RX_PURGE_TIMEOUT                      500 ms
ARQ_SYNC_LOSS_TIMEOUT                     1000 ms

TABLE V: Effective PHY-layer data rates.

Link       MCS            PHY rate (Mbps)
Uplink     QPSK, 1/2      1.344
Downlink   64 QAM, 1/2    15.120
Downlink   64 QAM, 2/3    20.160
Downlink   64 QAM, 3/4    22.680
Downlink   64 QAM, 5/6    25.200

For each PHY setting, we run a number of reliability con- 
figurations involving different NC, HARQ, and ARQ settings. 
The implemented BS PHY-layer parameters are shown in
Table III. Also shown are HARQ and ARQ parameters when
both are turned on. These represent the default equipment con- 
figuration, whereby CC-HARQ is employed. The maximum 
number of HARQ UpLink (UL) bursts per frame is set to 
1. The maximum number of HARQ DownLink (DL) bursts 
per frame is set to 1. The frame offset between an UL burst 
with respect to its UL ACK is set to 3. The frame offset of 
the DL ACK is set to 1. In addition, PDU SN extended sub- 
header reordering is enabled and the maximum number of 
retransmissions is set to 4. 

For ARQ, we use the following default settings. The min- 
imum time interval a transmitter will wait before retrans- 
mission of an unacknowledged ARQ block is set to 100 
ms, where the interval starts at the last block transmission. 
ARQ block size is set to 256 bytes. The transmission win- 
dow size (i.e., number of queued ARQ ACK blocks at any 
given time) is set to 1024. ACK processing time is set 
to 0. The maximum time interval an ARQ block will be 
managed by the transmitter ARQ state machine, once initial 
transmission of the block has occurred, is set to 500 ms. If 
transmission (or subsequent retransmission) of the block is 
not acknowledged by the receiver before this time limit is 
reached, the block is discarded. In-order delivery is enabled. 
The time interval the receiver will wait, after successful
reception of a block that does not result in advancement of
ARQ_RX_WINDOW_START, before advancing that value is set to
500 ms. Lastly, the maximum time interval that the
ARQ_TX_WINDOW_START or ARQ_RX_WINDOW_START parameters can
stay at the same value before a loss of synchronization between
transmitter and receiver is declared is set to 1000 ms.

It is important to note that, owing to the fixed 10 MHz
channel bandwidth, the available PHY settings of Table IV
yield the PHY-layer data rates shown in Table V [15], where
the data rates relevant to our experiments are highlighted.

B. Reliability Configurations

For each PHY setting, the 11 tested reliability configurations
are shown in Table VI, where $N_m$ is the number of redundancy
packets per round in NC.

TABLE VI: Reliability Configurations.

Configuration   ARQ   HARQ   NC
Raw             OFF   OFF    OFF
HARQ            OFF   ON     OFF
HARQ-ARQ        ON    ON     OFF
NC-10           OFF   OFF    ON, $N_m$ = 10
NC-15           OFF   OFF    ON, $N_m$ = 15
NC-20           OFF   OFF    ON, $N_m$ = 20
NC-24           OFF   OFF    ON, $N_m$ = 24
NC-30           OFF   OFF    ON, $N_m$ = 30
NC-40           OFF   OFF    ON, $N_m$ = 40
NC-60           OFF   OFF    ON, $N_m$ = 60
NC-120          OFF   OFF    ON, $N_m$ = 120

We set NC parameters to the simple configurations summarized
in Table VII. A single thread ($N_p = 1$) is implemented,
and a single redundancy round ($N_k = 1$) of $N_m$ packets is
transmitted immediately ($T_r = 0$) after the block. The
processing length threshold ($L_t$) and processing time interval
($T_i$) of the buffer list are set to 22400 bytes and 1 s,
respectively. Finally, the maximum segment length ($L_m$) and
preferred number of segments (i.e., initial block size, $N_r$)
are set to 1400 bytes and 120 segments, respectively.

TABLE VII: NC parameters.

Parameters   Value
$N_p$        1
$N_k$        1
$T_r$        0 s
$T_i$        1 s
$L_t$        22400 bytes
$L_m$        1400 bytes
$N_r$        120
$N_m$        Follows NC configuration index

For each NC configuration, the approximate NC Code Rate
(CR), defined in Section III-B1, is calculated and shown in
Table VIII. For instance, in NC-10, the block size $N_r$ is 120
and $N_m$ is 10. Therefore, 130 packets are sent per block,
achieving a CR of 12/13. In theory, for best performance, the
CR should match the raw throughput percentage.

C. Transmission Trials

In our experiments, UDP traffic is used to emulate real-time
traffic. For each PHY setting and reliability configuration,
two transmission trials are conducted through Iperf and UFTP,
respectively. Iperf [67] is an application-layer network
performance tool capable of creating UDP streams for
throughput measurements; UFTP [68] is a UDP-based FTP
application. Both measurement tools thus use UDP as the
underlying transport protocol. In both the Iperf and UFTP
trials, an application-layer load of 6 Mbps is offered at a fixed
1400-byte packet size. Each individual Iperf trial is terminated
after a fixed duration of 60 seconds, whereas the UFTP
transmissions are run until a 50 MByte file is successfully
transferred. Note that the offered load of 6 Mbps is well
below the effective downlink PHY-layer data rates shown in
Table V. Note also that the measured losses are observed at the
application layer (through Iperf); lower-layer statistics
are not available in our experiments.

TABLE VIII: Code Rate (CR) for NC Configurations.

NC Configuration   Code Rate (CR)
NC-10              12/13 = 0.92
NC-15              8/9 = 0.89
NC-20              6/7 = 0.86
NC-24              5/6 = 0.83
NC-30              4/5 = 0.80
NC-40              3/4 = 0.75
NC-60              2/3 = 0.67
NC-120             1/2 = 0.50
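The code rates of Table VIII follow from the NC-10 example in Section IV-B: a block of $N_r$ segments followed by $N_m$ redundancy packets gives a CR of $N_r / (N_r + N_m)$. The short sketch below reproduces the table with exact fractions; the function name `code_rate` is illustrative.

```python
from fractions import Fraction

N_R = 120                                            # preferred number of segments
REDUNDANCY_LEVELS = [10, 15, 20, 24, 30, 40, 60, 120]  # N_m of each NC configuration


def code_rate(n_r: int, n_m: int) -> Fraction:
    """Approximate NC code rate: data packets over total packets per block."""
    return Fraction(n_r, n_r + n_m)


for n_m in REDUNDANCY_LEVELS:
    cr = code_rate(N_R, n_m)
    print(f"NC-{n_m}: {cr} = {float(cr):.2f}")
```

As noted in Section IV-D, an exact calculation would use the computed number of segments $N_s$ rather than $N_r$ and would account for the NC header.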

D. Performance Metrics 

For each reliability configuration, we report the following 
performance metrics. 

1) Downlink Iperf loss percentage: the percentage of pack- 
ets lost over the total number of packets sent by Iperf, at the 
application layer, over the duration of the experiment. 

2) Downlink Iperf throughput, loss and redundancy bandwidth:
the throughput is the number of packets successfully
received by Iperf over the duration of the experiment, at
the application layer. Two related values are the bandwidth
loss and the redundancy bandwidth. The loss is calculated
by subtracting the throughput from the offered load. The
redundancy bandwidth is the additional bandwidth used
beyond the offered load for the purpose of redundancy. In
the raw case, the redundancy bandwidth is 0. For HARQ
and HARQ-ARQ, since performance measurements are not
available within the WiMAX stack, we assume a best-case
scenario where the redundancy bandwidth is also 0. For NC, we
simply approximate the redundancy bandwidth as

$$\frac{N_m}{N_r} \times o,$$

where $N_m$ is the number of redundancy packets per round,
$N_r$ is the preferred number of segments and $o$ is the offered
load. Note that for exact calculations, the computed number
of segments $N_s$ should replace $N_r$, and the actual redundancy
bandwidth should include the NC header overhead.

3) Downlink Iperf Throughput to Loss plus Redundancy
Ratio (TLR): TLR is calculated as

$$\mathrm{TLR} = \frac{T}{L + R},$$

where $T$ is the throughput, $L$ is the lost bandwidth and $R$
is the redundancy bandwidth. An efficient scheme should give
high throughput while keeping lost and redundancy bandwidth
low. Thus, TLR is a measure of efficiency.

4) Downlink UFTP file transfer delay: in UFTP, a file
is divided and packetized into UDP packets of a specified
length [68]. The transmitter sends the packets; the receiver
responds with NACKs for missing packets; the transmitter then
resends the missing packets. File transfer is completed when
the transmitter receives no NACKs from the receiver.

Fig. 6: Downlink throughput, lost bandwidth and redundancy
bandwidth for 64 QAM CTC 5/6 at 20 dBm under an offered
load of 6 Mbps (Iperf measurements averaged over 60 s).

Fig. 7: Downlink Throughput to Loss plus Redundancy Ratio
(TLR) for 64 QAM CTC 5/6 at 20 dBm under an offered load
of 6 Mbps (Iperf measurements averaged over 60 s).
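The Iperf-based metrics defined in items 2) and 3) of Section IV-D can be computed as in the sketch below. The function names and the example throughput of 5.4 Mbps are illustrative assumptions, not measured values from the experiments.

```python
def lost_bandwidth(offered_load: float, throughput: float) -> float:
    """Loss L: offered load minus application-layer throughput (Mbps)."""
    return offered_load - throughput


def redundancy_bandwidth(n_m: int, n_r: int, offered_load: float) -> float:
    """Approximate NC redundancy bandwidth R = (N_m / N_r) * o (Mbps)."""
    return n_m / n_r * offered_load


def tlr(throughput: float, lost: float, redundancy: float) -> float:
    """Throughput to Loss plus Redundancy Ratio, T / (L + R)."""
    denominator = lost + redundancy
    if denominator == 0:
        return float("inf")  # lossless transfer with no redundancy
    return throughput / denominator


OFFERED = 6.0     # Mbps, as in the experiments
THROUGHPUT = 5.4  # Mbps, HYPOTHETICAL measurement for illustration

loss = lost_bandwidth(OFFERED, THROUGHPUT)
redundancy = redundancy_bandwidth(30, 120, OFFERED)  # NC-30
print(round(loss, 2), round(redundancy, 2), round(tlr(THROUGHPUT, loss, redundancy), 2))
```

For the raw, HARQ and HARQ-ARQ configurations the redundancy term is set to 0, following the best-case assumption stated in item 2).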

V. Results and Discussion 

In this section, we first present the results of all reliability 
configurations for one of the PHY settings. We then summarize 
and discuss the results for all PHY settings. 

A. Case-Study: 64 QAM CTC 5/6 at 20 dBm 

The MCS and power level considered in this section is
64 QAM CTC 5/6 at 20 dBm. As shown in Table IV, this
PHY setting yields SS-measured CINR, RSSI and Average Tx
Power values of 18 dB, -73 dBm and -63 dBm, respectively.

The histogram of Fig. 6 combines throughput, lost bandwidth
and redundancy bandwidth results for each of the reliability
configurations, listed on the horizontal axis. Throughput
is represented in dark gray at the base of the histogram
columns, whereas loss is represented in light gray above
throughput. Redundancy bandwidth is shown at the top of the
NC columns. No redundancy bandwidth is shown for HARQ
and ARQ since the relevant MAC- and PHY-layer information is
unavailable. The quantities represented in Fig. 6 are averaged
over a duration of 60 s through Iperf, at the application layer.

Observe that some of the NC configurations yield significant 
throughput increases compared to the raw, HARQ and HARQ- 
ARQ configurations. The highest throughput is achieved by 
NC-40 and is at least 40% above any of the non-NC con- 
figurations. Throughput of the NC configurations increases 
steadily with the redundancy level until it reaches 90% of 
the offered load (6 Mbps). As expected, packet losses also 
decrease for rising values of N m . Among all the configurations 
tested in this PHY setting, NC-10 exhibits the highest packet 
loss while NC-40 has the lowest. Moreover, the reduction in
packet losses of HARQ-ARQ compared to HARQ (12%)
illustrates potential benefits of ARQ under this PHY setting.

Fig. 6 shows that all the NC configurations except NC-10
improve throughput and reduce loss compared to the raw case.
In contrast, the throughput and loss performance achieved by
HARQ and HARQ-ARQ are slightly below raw. We provide
some possible reasons for this observation in Section V-C.
Also keep in mind that the raw throughput is raw unreliable
throughput, whereas HARQ and ARQ throughput represent
reliable in-order packet flows.

Fig. 7 shows TLR measured through Iperf. For the NC
configurations, TLR values exhibit a maximum around
$N_m = 30$, with levels increasing for $N_m < 30$ and decreasing for
$N_m > 30$. Hence, an optimal redundancy value that achieves
the highest throughput using minimal redundancy (redundancy
bandwidth) exists at around $N_m = 30$. Note that although high
throughput levels are achieved for all $N_m \geq 40$ (see Fig. 6),
none are as efficient as NC-30.

The lowest throughput and TLR occur at $N_m = 10$. At
low redundancy values, $N_m$ is not sufficient to compensate
for channel losses. As a consequence, blocks get discarded at
the transmitter before a sufficient number of coded segments is
received. At excessive redundancy values, on the other hand,
the increase in overhead consumes valuable channel resources
from data transmission, reducing efficiency and throughput.
The described tradeoff is clear in the TLR profile of Fig. 7.
Note that the configurations with the highest TLR, NC with
$24 \leq N_m \leq 40$, outperform the raw configuration, which
transmits no redundancy packets.
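The low-redundancy side of this tradeoff can be illustrated with a simple model. The sketch below is not taken from the experiments: it assumes independent packet losses, an idealized decoder that recovers a block whenever at least $N_r$ of the $N_r + N_m$ transmitted packets arrive, and a hypothetical loss rate of 25%.

```python
from math import comb


def block_recovery_probability(n_r: int, n_m: int, loss_rate: float) -> float:
    """Probability that at least n_r of the n_r + n_m packets are received.

    ASSUMES independent losses and an ideal erasure code (any n_r received
    packets suffice); real channels and random codes deviate from both.
    """
    n_sent = n_r + n_m
    p_ok = 1.0 - loss_rate
    return sum(
        comb(n_sent, k) * p_ok**k * loss_rate ** (n_sent - k)
        for k in range(n_r, n_sent + 1)
    )


LOSS_RATE = 0.25  # HYPOTHETICAL raw packet loss rate
for n_m in [10, 15, 20, 24, 30, 40, 60, 120]:
    p = block_recovery_probability(120, n_m, LOSS_RATE)
    print(f"NC-{n_m}: P(block recovered) = {p:.4f}")
```

In this model the recovery probability rises sharply once the code rate drops to about the fraction of packets delivered by the raw channel, while further redundancy adds little beyond extra load.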

Fig. 8 shows the file transfer delay for all tested
configurations at this PHY setting. The delay profile is the inverse
of the TLR profile, with a minimum around NC-40. With a
46% reduction from that of raw, NC-40 provides the best delay
performance. As mentioned in Section IV-D4, UFTP is capable
of completing the file transfer over an unreliable link (e.g., raw
and NC-10 configurations) by resorting to its own reliability
mechanism at the application layer.

B. Summary of Results 

In this section, we compare the results of the raw, HARQ 
and HARQ-ARQ configurations with the best NC configu- 
ration at all four PHY settings. The best NC configuration, 
termed NC-Best in the result figures, is the one yielding the 
highest performance for any given measured metric. 



[Figure 8: bar chart of file transfer delay (s) per reliability
configuration; legible bar labels: NC-20 110.02, NC-24 98.03,
NC-30 81.02, NC-40 76.06, NC-60 80.03]

Fig. 8: Downlink file transfer delay for 64 QAM CTC 5/6 at
20 dBm over an offered load of 6 Mbps (UFTP measurements
for 50 MB file).

Fig. 9: Average Downlink Loss (%) Comparison under an
offered load of 6 Mbps (Iperf measurements over 60 s).

Fig. 10: Downlink Throughput (Mbps) Comparison under an
offered load of 6 Mbps (Iperf measurements over 60 s).

Fig. 11: Throughput to Loss plus Redundancy Ratio (TLR)
Comparison under an offered load of 6 Mbps (Iperf - 60 s).



Fig. 9 shows the average downlink loss percentages for all
four PHY settings. As expected, as code rate increases, the raw
loss percentage increases. However, the loss percentage for
HARQ and HARQ-ARQ is higher at lower code rates. The
use of ARQ, in particular, increases losses significantly
(15%-25%) under the three PHY settings with the lower rates. In
contrast, the best NC configuration keeps the loss percentage
close to 0% for all PHY settings.



Fig. 10 shows the average downlink throughput for all four
PHY settings under the offered load of 6 Mbps. These results
mirror the loss percentage results: as PHY code rate increases,
raw throughput decreases, whereas HARQ and HARQ-ARQ
throughputs increase. The inefficiency of HARQ/ARQ at low
PHY code rates may be due to the lower data rate available for
downlink retransmissions, as shown in Table V. The best NC
configuration keeps throughput close to the full offered load of
6 Mbps. Therefore, in addition to introducing a high level of
reliability, NC is capable of increasing the raw throughput by
a factor of up to 1.4. More significantly, it increases the
throughput of HARQ and HARQ-ARQ by factors of up to 3.0 and
5.9, respectively.



Fig. 11 depicts the TLR for all four PHY settings under the
offered load of 6 Mbps. As code rate increases, losses grow,
leading to decreasing raw TLR levels. The NC configurations
exhibit a similar decreasing profile. Although NC removes
losses seen in the raw configuration almost entirely through
redundancy, it remains more efficient than raw for all PHY
settings. As with their loss performance, the TLR levels of
HARQ and HARQ-ARQ improve at higher PHY code rates. Even
though any potential redundancy bandwidth in HARQ and
HARQ-ARQ is ignored, those configurations remain less
efficient than NC.

Fig. 12 shows the downlink file transfer delay (s) for all
PHY settings. Observe that as PHY code rate increases, raw
delay tends to increase. It is important to note that the delay
figures reported in this work apply to best effort (BE)
traffic flows. In HARQ and HARQ-ARQ, the delay tends
to decrease, thus confirming the higher efficiency of those
reliability configurations at higher PHY code rates and CINR
levels. Owing to its lower packet losses, NC maintains the
lowest transfer delay of all the tested configurations, with a
delay around 70 s. NC reduces the file transfer delay by up to
1.9 times compared to raw, 2.8 times compared to HARQ and 5.5
times compared to HARQ-ARQ.




Fig. 12: Downlink file transfer delay (s) Comparison under an
offered load of 6 Mbps (UFTP - 50 MB file).



C. Discussion 

The trend across different PHY settings is consistent: NC 
configurations use the redundancy bandwidth to increase 
throughput and reduce losses significantly. In contrast, HARQ 
and HARQ-ARQ reduce throughput and increase losses, par- 
ticularly at lower PHY code rates. The loss percentage graph 
of Fig. [9] shows that NC works well as a packet erasure code. 

The loss reduction and throughput gains of the NC
configurations depend on the number $N_m$ of redundancy
packets per round. From the results shown in Section
V-A, we can see that a large $N_m$ may not be necessary, while
a small $N_m$ may not be sufficient. When $N_m$ is too small,
most coded blocks cannot be recovered, incurring additional
loss, reducing throughput and increasing file transfer delay.
When $N_m$ is too large, redundant packets become overhead,
leading to possible buffer overflows. Intuitively, the optimal
$N_m$ should be at a level that makes the resulting NC Code Rate
(CR) match the raw throughput percentage, i.e., by sending
an appropriate amount of redundancy a priori with the NC
scheme, the raw unreliable throughput is fully utilized while
reliability is achieved.



Indeed, Fig. 13 shows that the raw throughput percentage
closely matches the CRs of the NC-Best cases (dashed line).
Also observe that the throughput percentages of HARQ and
HARQ-ARQ show large gaps when compared to the best NC
schemes, particularly at low PHY code rates.
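The matching rule discussed above (CR equal to the raw throughput percentage) can be turned into a rough selection procedure. Solving $N_r / (N_r + N_m) = \rho$ for a raw throughput fraction $\rho$ gives $N_m = N_r (1/\rho - 1)$. The sketch below applies this under the simplifying assumptions already used for the CR; the function names and the example fractions are illustrative, not measured values.

```python
TESTED_LEVELS = (10, 15, 20, 24, 30, 40, 60, 120)  # N_m values of Table VI


def redundancy_for_target(n_r: int, raw_fraction: float) -> float:
    """N_m for which n_r / (n_r + N_m) equals the raw throughput fraction."""
    return n_r * (1.0 / raw_fraction - 1.0)


def closest_configuration(n_r: int, raw_fraction: float, tested=TESTED_LEVELS) -> int:
    """Tested N_m closest to the value suggested by the matching rule."""
    target = redundancy_for_target(n_r, raw_fraction)
    return min(tested, key=lambda n_m: abs(n_m - target))


# HYPOTHETICAL raw throughput fractions (share of the offered load delivered)
for fraction in (0.90, 0.80, 0.75):
    print(fraction, closest_configuration(120, fraction))
```

For example, a raw channel delivering 75% of the offered load would suggest $N_m = 40$, i.e., a CR of 3/4.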

At a load of 6 Mbps, HARQ and ARQ do not perform well. 
Compared to raw, HARQ and HARQ-ARQ show additional 
losses, reduced throughput and increased delays. Their perfor- 
mances improve as PHY code rates increase, although they do 
not outperform NC in our experiments. The low performance
of HARQ and ARQ may be due to a faulty implementation
or to non-optimal default parameters (e.g., delay timeouts,
maximum number of retransmissions). Verifying these
hypotheses, however, requires further access to the equipment.

Fig. 13: The throughput percentage of Raw, HARQ and
HARQ-ARQ, NC-Best compared to the CR of the NC-Best
(the NC configuration with the highest throughput).

Our experimental results suggest that NC has the potential to
replace HARQ and ARQ in future wireless network design.
We infer that there are three main reasons why NC outperforms
HARQ and ARQ. First, NC requires less reliance on
ACK packets. In HARQ and ARQ, since the transmitter has
to wait for an ACK or NACK packet in each transmission,
performance depends on the Round Trip Time (RTT). The
longer the RTT, the lower the expected overall performance.
In other words, the RTT limits the throughput of HARQ and
ARQ. In contrast, in NC, additional degrees of freedom (coded
packets) can be sent proactively ahead of time and lost packets
can be recovered from successfully received coded packets.
Hence, the RTT does not limit the overall performance.

Second, the low throughput of HARQ and HARQ-ARQ
suggests that they generate a large amount of overhead. In our
experiments, NC overhead reaches at most 100% (NC-120, where
$N_m = N_r = 120$), whereas for HARQ, each packet may be
retransmitted up to 4 times, as given by
HARQ_MAX_RETRANSMISSION, thus potentially incurring 400%
overhead. The resulting load may therefore reach or
exceed the supported PHY data rates (see Table V). The excess
load may cause buffer overflows at the MAC layer, resulting
in a drop in throughput.
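The worst-case loads implied by this argument can be checked against the downlink PHY rates of Table V. The sketch below is a back-of-the-envelope calculation: it assumes the Table V rates are in Mbps, that every packet uses all its HARQ retransmissions, and it ignores header and MAC overheads.

```python
OFFERED_LOAD = 6.0  # Mbps
DOWNLINK_PHY_RATES = {  # Table V, ASSUMED to be in Mbps
    "64 QAM, 1/2": 15.120,
    "64 QAM, 2/3": 20.160,
    "64 QAM, 3/4": 22.680,
    "64 QAM, 5/6": 25.200,
}
HARQ_MAX_RETRANSMISSION = 4
N_R, N_M_MAX = 120, 120  # NC-120, the most redundant NC configuration

# One initial transmission plus up to 4 retransmissions per packet.
harq_worst_case = OFFERED_LOAD * (1 + HARQ_MAX_RETRANSMISSION)
# Offered load plus 100% redundancy.
nc_worst_case = OFFERED_LOAD * (1 + N_M_MAX / N_R)

for mcs, rate in DOWNLINK_PHY_RATES.items():
    print(f"{mcs}: HARQ fits={harq_worst_case <= rate}, NC fits={nc_worst_case <= rate}")
```

Under these assumptions the worst-case HARQ load of 30 Mbps exceeds every downlink rate, whereas the 12 Mbps of NC-120 stays below all of them.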

Third, and most importantly, in HARQ and ARQ, each 
additional redundant packet can only compensate for a par- 
ticular lost packet, while in NC, each additional coded packet 
can compensate for any lost packet in the block. In HARQ 
and ARQ, a particular packet is deemed lost if it is not 
successfully received after a fixed number of retransmissions. 
In NC, however, any lost packet can be replaced by the next 
coded packet. Therefore, NC is more robust to lost packets. It 
is also less sensitive to lost ACK packets, since it requires at 
most one ACK packet per block. 
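The property that any coded packet can stand in for any lost packet can be demonstrated with a toy random linear code. The sketch below is a deliberate simplification: it works over GF(2) (coded packets are XORs of segments), represents payloads as small integers, and uses fixed coefficient vectors so that the example is reproducible; it does not reflect the field size or coefficient generation of the implemented architecture.

```python
def encode(segments, coeff_bits):
    """XOR together the segments selected by coeff_bits (coded packet over GF(2))."""
    payload = 0
    for segment, bit in zip(segments, coeff_bits):
        if bit:
            payload ^= segment
    return payload


def decode(packets, n_segments):
    """Gauss-Jordan elimination over GF(2).

    packets: list of (coeff_bits, payload). Returns the original segments,
    or None if the received packets do not span all n_segments.
    """
    rows = [(list(coeffs), payload) for coeffs, payload in packets]
    for col in range(n_segments):
        # Rows 0..col-1 already hold pivots; search the remaining rows.
        pivot = next((i for i in range(col, len(rows)) if rows[i][0][col]), None)
        if pivot is None:
            return None
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for i in range(len(rows)):
            if i != col and rows[i][0][col]:
                rows[i] = (
                    [a ^ b for a, b in zip(rows[i][0], rows[col][0])],
                    rows[i][1] ^ rows[col][1],
                )
    return [rows[i][1] for i in range(n_segments)]


segments = [0x11, 0x22, 0x33, 0x44]
coded = [([1, 1, 1, 1], encode(segments, [1, 1, 1, 1])),
         ([0, 1, 1, 0], encode(segments, [0, 1, 1, 0]))]

# Systematic packets 1 and 3 are lost; the same two coded packets repair them.
received = [([1, 0, 0, 0], 0x11), ([0, 0, 1, 0], 0x33)] + coded
print(decode(received, 4) == segments)
```

The same two coded packets also repair the loss of segments 0 and 2, which a retransmission of two specific packets could not do without feedback.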

VI. Conclusions 

This work proposes and demonstrates a network-coding 
(NC)-enabled reliability architecture for next generation wire- 
less networks. In our design, NC is used as a packet erasure 
code providing resilience against errors below the IP layer. 
We validate our design through an experimental case study 
at a GENI WiMAX site, where we compare our NC archi- 
tecture to default HARQ and ARQ in terms of packet loss, 
throughput and file transfer delay. We demonstrate that NC
is potentially superior as a packet erasure code. Compared to
HARQ and ARQ, NC potentially offers a gain of 5.9 times 
in throughput and a reduction of 5.5 times in file transfer 
delay. Our experimental setups were limited by our ability 
to access and configure the GENI WiMAX platform. Owing 
to its flexibility and simplicity, we believe that the proposed 
NC architecture may become instrumental in providing faster 
and more efficient next generation wireless network services 
through low-cost upgrades. 

This initial architectural design opens up a number of new
and exciting avenues for future investigation. Immediate
follow-ups may investigate the performance sensitivity to different
offered loads. The experimentation could also be extended to
investigate various parameters of the proposed design, such
as the number of redundancy transmission rounds ($N_k$) or
concurrent encoder-decoder thread pairs ($N_p$). In addition,
wider access to the wireless communication equipment (i.e., 
WiMAX BS and SS, in our case) would enable a more 
complete study, encompassing features such as signal-to-noise 
ratio (SNR) and power control, HARQ and ARQ fine-tuning, 
operation under adaptive modulation and coding (AMC), and 
mobility. The optimization of the decoding time is also an 
interesting direction to pursue. Different decoding algorithms 
such as the Jacobi iterative method for finite field matrix 
inversion may be considered. 

Ultimately, the extension of our design to an adaptive 
scheme that dynamically adjusts various design parameters 
using information available through ACKs or through channel 
quality information is an important design goal. In addition, 
the joint optimization of rate and power control under NC 
would be a valuable next step. Furthermore, the study may be 
broadened to mobile SSs, multiple-hop topologies and traffic- 
dependent coding interfaces. 

We believe that the integration of NC within the WMAN 
protocol stack, namely at the convergence sublayer, will 
yield additional advantage for both upper and lower layers. 
For instance, it may alleviate PHY-layer BER constraints 
or improve responsiveness to channel conditions for various 
supported traffic flows. Although NC will require upgrades 
at all participating base and subscriber stations, the involved 
software and protocol upgrades are minor and well within 
the reach of operators. Besides, once installed, they may be 
exploited for service differentiation. 